Compositional Translation of Technical Terms by Integrating Patent Families as a Parallel Corpus and a Comparable Corpus

Authors

  • Itsuki Toyota
  • Zi Long
  • Lijuan Dong
  • Takehito Utsuro
  • Mikio Yamamoto
Abstract

Previous methods for generating a bilingual lexicon from parallel patent sentences extracted from patent families draw those sentences from only about 30% of the “Background” and “Embodiment” parts; the remaining 70% or so goes unused. This paper therefore proposes generating a bilingual lexicon for technical terms not only from that 30% but also from the remaining 70%. The proposed method applies compositional translation estimation and uses the remaining 70% as a comparable corpus for validating translation candidates. As bilingual constituent lexicons for compositional translation, we use an existing bilingual lexicon together with the phrase translation table trained on the parallel patent sentences extracted from the 30%. Finally, we show that about 3,600 technical term translation pairs can be acquired from 1,000 patent families.
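The idea of composing candidates from constituent translations and then validating them against a comparable corpus can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy lexicon, the Japanese-to-English language pair, the function names, the order-preserving composition, and the plain substring check used for validation are all assumptions made for illustration.

```python
from itertools import product


def segmentations(tokens):
    """Yield every split of `tokens` into contiguous constituents."""
    if not tokens:
        yield []
        return
    for i in range(1, len(tokens) + 1):
        head = tuple(tokens[:i])
        for rest in segmentations(tokens[i:]):
            yield [head] + rest


def compose_candidates(tokens, lexicon):
    """Compose target-side candidates from constituent translations.

    Assumption: constituent order is preserved on the target side.
    """
    candidates = set()
    for seg in segmentations(tokens):
        # Skip segmentations with a constituent missing from the lexicon.
        if all(part in lexicon for part in seg):
            for combo in product(*(lexicon[part] for part in seg)):
                candidates.add(" ".join(combo))
    return candidates


def validate(candidates, comparable_text):
    """Keep candidates that occur in the target-side comparable text.

    Assumption: a plain substring test stands in for corpus validation.
    """
    return sorted(c for c in candidates if c in comparable_text)


# Hypothetical constituent lexicon: an existing bilingual lexicon merged
# with phrase-table entries (toy data, not taken from the paper).
lexicon = {
    ("光",): ["optical", "light"],
    ("ファイバ",): ["fiber", "fibre"],
    ("光", "ファイバ"): ["optical fiber"],
}

# Hypothetical target-side text from the part of a patent family that
# yielded no parallel sentences, used here as the comparable corpus.
comparable_text = "the optical fiber is connected to the light source"

candidates = compose_candidates(["光", "ファイバ"], lexicon)
validated = validate(candidates, comparable_text)
print(validated)
```

In this sketch the lexicon proposes four candidates for the term, and only the one attested in the comparable text survives validation. A real system would need tokenization, reordering, and a scoring function rather than a binary substring test.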

Similar resources

Effect of Domain-Specific Corpus in Compositional Translation Estimation for Technical Terms

This paper studies issues on compiling a bilingual lexicon for technical terms. In the task of estimating bilingual term correspondences of technical terms, it is usually quite difficult to find an existing corpus for the domain of such technical terms. In this paper, we take an approach of collecting a corpus for the domain of such technical terms from the Web. As a method of translation estim...

A Comparative Study On Compositional Translation Estimation Using A Domain/topic-Specific Corpus Collected From The Web

This paper studies issues related to the compilation of a bilingual lexicon for technical terms. In the task of estimating bilingual term correspondences of technical terms, it is usually rather difficult to find an existing corpus for the domain of such technical terms. In this paper, we adopt an approach of collecting a corpus for the domain of such technical terms from the Web. As a method o...

Vocabulary Lists for EAP and Conversation Students

Despite the abundance of research investigating general and academic vocabularies and developing dozens of word lists, few studies have compared academic vocabulary with general service word lists such as conversation vocabulary. Many EAP researchers assume that university students need to know all the words in West’s (1953) General Service List (GSL) as a prerequisite to academic words (e.g., ...

Extraction of Bilingual Technical Terms for Chinese-Japanese Patent Translation

The translation of patents or scientific papers is a key issue that should be helped by the use of statistical machine translation (SMT). In this paper, we propose a method to improve Chinese–Japanese patent SMT by premarking the training corpus with aligned bilingual multi-word terms. We automatically extract multi-word terms from monolingual corpora by combining statistical and linguistic fil...

Improving Compositional Translation with Comparable Corpora

We improved the compositional term translation method by using comparable corpora. A bilingual lexicon consisting of pairs of word sequences within terms and their correlations is derived from a bilingual document-aligned corpus. Then, for an input term, compositional translations are produced together with their confidence scores by consulting the corpus-derived bilingual lexicon. Thus, we can...

Publication date: 2013